With the rise of research on semantics-empowered communication (SemCom), a wide range of aspects (e.g., theories, applications, metrics, and implementations) are now attracting unprecedented interest in both academia and industry. In this work, we aim to provide a comprehensive survey of both the background and the research taxonomy, as well as a detailed technical tutorial. Specifically, we start by reviewing the literature and answering the "what" and "why" questions of semantic transmission. We then present the corresponding ecosystems, including theories, metrics, datasets, and toolkits, on top of which a taxonomy of research directions is presented. Furthermore, we propose to categorize the critical enabling techniques into explicit and implicit reasoning-based methods, and elaborate on how they evolve and contribute to modern content- and channel-semantics-empowered communications. Besides reviewing and summarizing the latest efforts in SemCom, we discuss its relations with other communication levels (e.g., reliable and goal-oriented communications) from a holistic and unified viewpoint. Subsequently, to facilitate future development and industrial application, we highlight advanced practical techniques for boosting semantic accuracy, robustness, and large-scale scalability, among others. Finally, we discuss technical challenges that shed light on future research opportunities.
Most Deep Learning (DL) based Compressed Sensing (DCS) algorithms adopt a single neural network for signal reconstruction and fail to jointly consider the influence of the sampling operation on reconstruction. In this paper, we propose a unified framework that jointly considers the sampling and reconstruction processes for image compressive sensing based on well-designed cascade neural networks. The proposed framework comprises two sub-networks: a sampling sub-network and a reconstruction sub-network. In the sampling sub-network, an adaptive fully connected layer, rather than the traditional random matrix, is used to mimic the sampling operator. In the reconstruction sub-network, a cascade network combining a stacked denoising autoencoder (SDA) and a convolutional neural network (CNN) is designed to reconstruct signals. The SDA solves the signal mapping problem and produces an initial reconstruction. The CNN then fully recovers the structure and texture features of the image to achieve better reconstruction performance. Extensive experiments show that this framework outperforms many other state-of-the-art methods, especially at low sampling rates.
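The two-stage pipeline above can be sketched numerically: a learned fully connected layer plays the role of the sampling operator, and a linear map stands in for the SDA's initial reconstruction (the real sub-networks are trained jointly; the block size, sampling rate, and pseudo-inverse initializer here are illustrative assumptions).

```python
import numpy as np

rng = np.random.default_rng(0)

block = 16           # image block size (hypothetical)
n = block * block    # flattened signal dimension
m = int(0.1 * n)     # measurements at a 10% sampling rate

# Sampling sub-network: a single fully connected layer whose weight
# matrix plays the role of the (learned) sampling operator Phi.
Phi = rng.standard_normal((m, n)) / np.sqrt(m)

x = rng.standard_normal(n)       # flattened image block
y = Phi @ x                      # adaptive sampling: y = Phi x

# Initial reconstruction (stand-in for the SDA stage): a linear mapping
# that would be learned jointly with Phi; here simply the pseudo-inverse.
x0 = np.linalg.pinv(Phi) @ y

print(y.shape, x0.shape)
```

In the actual framework, a CNN would further refine `x0` to recover fine structure and texture.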
The Age of Information metric cannot properly characterize the intrinsic semantics of status updates. For a cooperative relay communication system aided by an intelligent reflecting surface, we propose the Age of Semantics (AoS) to measure the semantic freshness of status updates. Specifically, we focus on status updates from a source node (SN) to a destination, which we formulate as a Markov decision process (MDP). The SN's objective is to maximize the expected satisfaction of AoS and energy consumption under a maximum transmit power constraint. To find the optimal control policy, we first derive an online deep actor-critic (DAC) learning scheme under the on-policy temporal-difference learning framework. However, implementing online DAC in practice poses a key challenge: it requires infinitely repeated interactions between the SN and the system, which can be hazardous, especially during exploration. We therefore propose a novel offline DAC scheme that estimates the optimal control policy from a previously collected dataset, without any further interaction with the system. Numerical experiments verify the theoretical results and show that our offline DAC scheme significantly outperforms the online DAC scheme and the most representative baselines in terms of average utility, demonstrating strong robustness to dataset quality.
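A minimal sketch of the kind of actor-critic update underlying a DAC scheme, under toy assumptions: linear actor and critic, a stand-in scalar reward in place of the AoS/energy utility, and the TD error used as the advantage signal. None of the dimensions or learning rates come from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy one-step actor-critic (TD) update, sketching the learning step
# that a deep actor-critic scheme repeats with neural networks.
s_dim, a_dim = 4, 1
W_actor = rng.standard_normal((a_dim, s_dim)) * 0.1   # linear policy
w_critic = rng.standard_normal(s_dim) * 0.1           # linear value function
gamma, lr = 0.95, 0.01

s = rng.standard_normal(s_dim)
a = W_actor @ s                                       # deterministic action
r = -float(a @ a)                                     # stand-in utility signal
s_next = rng.standard_normal(s_dim)

# Critic: temporal-difference error and semi-gradient TD(0) step.
td = r + gamma * (w_critic @ s_next) - (w_critic @ s)
w_critic += lr * td * s

# Actor: nudge the policy in the direction the TD error rewards.
W_actor += lr * td * np.outer(np.ones(a_dim), s)

print(td)
```

An offline variant would draw `(s, a, r, s_next)` tuples from a fixed, previously collected dataset instead of interacting with the live system.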
Existing source cell-phone identification methods lack long-term feature characterization of the source device, resulting in inaccurate representations of device-related features and thus insufficient identification accuracy. In this paper, we propose a source cell-phone identification method based on spatio-temporal representation learning, which comprises two main parts: extraction of sequential Gaussian mean matrix features, and construction of an identification model based on spatio-temporal representation learning. In the feature extraction part, based on an analysis of the time-series representation of the recorded source signal, we exploit the sensitivity of Gaussian mixture models to the data distribution to extract a sequential Gaussian mean matrix with both long-term and short-term representation ability. In the model construction part, we design a structured spatio-temporal representation network, C3D-BiLSTM, which combines a 3D convolutional network with a bidirectional long short-term memory network to learn short-term spectral information and long-term fluctuation information, and accurately identifies the source cell phone by fusing the spatio-temporal feature information of the recorded signal. The method achieves an average closed-set identification accuracy of 99.03% across 45 cell phones on the CCNU_Mobile dataset, and an average identification rate of 98.18% in small-sample-size experiments, outperforming existing state-of-the-art methods. The experimental results show that the proposed method delivers excellent performance in multi-class cell-phone identification.
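A rough sketch of what a "sequential Gaussian mean matrix" could look like: the recording's frame sequence is split into consecutive segments and each segment is summarized by a mean feature vector, giving one row per segment (long-term ordering) over per-frame spectral features (short-term detail). Using plain segment means instead of fitted GMM component means, and the frame/segment counts, are simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

n_frames, feat_dim, n_segments = 200, 13, 10
frames = rng.standard_normal((n_frames, feat_dim))   # e.g. MFCC-like frames

seg_len = n_frames // n_segments
# One mean vector per segment, stacked in temporal order.
mean_matrix = np.stack([
    frames[i * seg_len:(i + 1) * seg_len].mean(axis=0)
    for i in range(n_segments)
])

print(mean_matrix.shape)
```

A network such as the proposed C3D-BiLSTM would then consume this matrix, convolutions capturing the short-term axis and the recurrent layers the long-term one.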
With the rapid development of network technology and the proliferation of network devices, data throughput has grown substantially. To alleviate the backhaul bottleneck in cellular networks and meet users' latency requirements, edge caching architectures proactively keep a limited set of popular content at the network edge based on predicted demand. Meanwhile, the interactions between content (e.g., deep neural network models, or knowledge bases similar to Wikipedia) and users can be modeled as a dynamic bipartite graph. In this paper, to maximize the cache hit rate, we leverage an effective dynamic graph neural network (DGNN) to jointly learn the structural and temporal patterns embedded in the bipartite graph. Furthermore, to gain deeper insight into the dynamics of the evolving graph, we propose an Age of Information (AoI)-based attention mechanism that extracts valuable historical information while avoiding the problem of message staleness. Combined with the above prediction model, we also develop a cache selection algorithm that makes caching decisions based on the prediction results. Extensive experiments demonstrate that our model achieves higher prediction accuracy than other state-of-the-art schemes on two real-world datasets. The hit-rate results further verify the superiority of caching policies based on our proposed model over conventional approaches.
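The AoI-based attention idea can be sketched as a freshness-weighted aggregation over historical interaction embeddings: older interactions receive exponentially smaller attention weights, so stale messages contribute less. The exponential decay form, decay rate, and embedding dimension are illustrative assumptions, not the paper's exact mechanism.

```python
import numpy as np

rng = np.random.default_rng(3)

d = 8
history = rng.standard_normal((5, d))        # 5 past interaction embeddings
age = np.array([0.0, 1.0, 2.0, 5.0, 10.0])   # AoI of each interaction
decay = 0.5

scores = np.exp(-decay * age)                # freshness-based attention logits
weights = scores / scores.sum()              # normalized attention weights
context = weights @ history                  # aggregated historical embedding

print(weights)
```

The aggregated `context` vector would then feed the DGNN's prediction of future content popularity.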
Transformer models have recently emerged as one of the foundational models in natural language processing, and, as a by-product, there has been significant recent interest and investment in scaling them up. However, the training and inference costs of these large Transformer language models are prohibitive, motivating more research into identifying more efficient variants. In this work, we propose a simple yet effective modification to the Transformer architecture, inspired by the literature on statistical language modeling, that augments the model with n-grams constructed from a discrete latent representation of the text sequence. We evaluate our model, the N-Grammer, on language modeling on the C4 dataset and text classification on the SuperGLUE dataset, and find that it outperforms several strong baselines such as the Transformer and the Primer. We open-source our model in JAX for reproducibility.
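The core augmentation can be sketched as follows: bigrams built from a discrete token sequence are hashed into a small table of learned embeddings, which are concatenated with the per-token embeddings. The vocabulary size, hashing scheme, dimensions, and the pad-first-position alignment are illustrative assumptions; the actual N-Grammer learns these representations jointly with the model.

```python
import numpy as np

rng = np.random.default_rng(4)

vocab, n_buckets, d_tok, d_ngram = 100, 64, 16, 8
tok_emb = rng.standard_normal((vocab, d_tok))       # token embedding table
ngram_emb = rng.standard_normal((n_buckets, d_ngram))  # hashed bigram table

tokens = np.array([3, 17, 42, 7])
bigrams = [(int(a), int(b)) for a, b in zip(tokens[:-1], tokens[1:])]
bucket = [hash(bg) % n_buckets for bg in bigrams]   # hash bigrams to buckets

# Align each bigram feature with the second token of its pair; pad position 0.
ng_feat = np.vstack([np.zeros(d_ngram), ngram_emb[bucket]])
augmented = np.concatenate([tok_emb[tokens], ng_feat], axis=1)

print(augmented.shape)
```

The concatenated representation then flows through the rest of the Transformer stack unchanged, which is what keeps the modification cheap.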
The black-box nature of deep neural networks (DNNs) severely hinders performance improvement and their application in specific scenarios. In recent years, methods based on class activation mapping have been widely used to interpret a model's internal decisions in computer vision tasks. However, when such methods obtain gradients via backpropagation, noise appears in the saliency map, and features irrelevant to the decision may even be highlighted. In this paper, we propose an Absolute-value Class Activation Mapping (Abs-CAM) method, which optimizes the gradients derived from backpropagation by turning them all into positive values, thereby enhancing the visual features of the output neuron's activation and improving the localization ability of the saliency map. The Abs-CAM framework has two phases: generating an initial saliency map, and generating the final saliency map. The first phase improves the localization ability of the saliency map by optimizing the gradients, and the second phase linearly combines the initial saliency map with the original image to enhance its semantic information. We conduct qualitative and quantitative evaluations of the proposed method, including deletion, insertion, and pointing-game tests. The experimental results show that Abs-CAM can clearly eliminate noise in saliency maps, better localize decision-relevant features, and outperform previous methods in recognition and localization tasks.
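The key weighting step can be sketched as a variant of gradient-weighted class activation mapping in which the gradients are replaced by their absolute values before channel-wise pooling, so negative gradients also count toward saliency. The shapes and the random stand-in gradients are illustrative; a real implementation would backpropagate the target class score through an actual network.

```python
import numpy as np

rng = np.random.default_rng(5)

C, H, W = 4, 7, 7
feature_maps = rng.standard_normal((C, H, W))  # last conv-layer activations
grads = rng.standard_normal((C, H, W))         # d(score)/d(feature_maps), toy

weights = np.abs(grads).mean(axis=(1, 2))      # absolute-value pooling per channel
cam = np.maximum((weights[:, None, None] * feature_maps).sum(axis=0), 0)
cam = cam / (cam.max() + 1e-8)                 # normalize to [0, 1]

print(cam.shape)
```

The second phase described in the abstract would then upsample `cam` and combine it linearly with the input image.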
With the development of deep learning (DL), natural language processing (NLP) allows us to analyze and understand large volumes of text. Consequently, with the help of NLP, we can perform joint semantic source and channel coding for semantic communication over a noisy channel. However, existing approaches to this goal use a fixed Transformer from NLP while ignoring the variation in the semantic information contained in each sentence. To address this problem, we propose a new semantic communication system based on the Universal Transformer. Compared with the conventional Transformer, the Universal Transformer introduces an adaptive recurrence mechanism. With this mechanism, the new semantic communication system can more flexibly transmit sentences carrying different amounts of semantic information and achieve better end-to-end performance under various channel conditions.
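The adaptive recurrence idea can be sketched as adaptive computation time: the same transition function is applied repeatedly, and processing halts once an accumulated halting probability crosses a threshold, so semantically "harder" inputs receive more refinement steps. The linear transition and halting functions here are toy stand-ins for the Universal Transformer's shared layer.

```python
import numpy as np

rng = np.random.default_rng(6)

d, max_steps, threshold = 8, 6, 0.99
W_trans = rng.standard_normal((d, d)) * 0.1   # shared transition (toy)
w_halt = rng.standard_normal(d) * 0.1         # halting head (toy)

h = rng.standard_normal(d)                    # sentence representation
halt_sum, steps = 0.0, 0
while halt_sum < threshold and steps < max_steps:
    h = np.tanh(W_trans @ h)                                # apply shared step
    halt_sum += 1.0 / (1.0 + np.exp(-(w_halt @ h)))         # halting probability
    steps += 1

print(steps)
```

In the semantic communication setting, this lets the encoder spend a variable number of refinement steps per sentence rather than a fixed layer count.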
In this work, we propose DiffWave, a versatile diffusion probabilistic model for conditional and unconditional waveform generation. The model is non-autoregressive, and converts the white noise signal into structured waveform through a Markov chain with a constant number of steps at synthesis. It is efficiently trained by optimizing a variant of variational bound on the data likelihood. DiffWave produces high-fidelity audio in different waveform generation tasks, including neural vocoding conditioned on mel spectrogram, class-conditional generation, and unconditional generation. We demonstrate that DiffWave matches a strong WaveNet vocoder in terms of speech quality (MOS: 4.44 versus 4.43), while synthesizing orders of magnitude faster. In particular, it significantly outperforms autoregressive and GAN-based waveform models in the challenging unconditional generation task in terms of audio quality and sample diversity from various automatic and human evaluations. Audio samples are available at https://diffwave-demo.github.io/ (*Contributed to the work during an internship at Baidu Research, USA.)
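The fixed-length Markov chain can be sketched with the standard DDPM-style reverse update: starting from white noise, each step removes a predicted noise component and rescales. The linear noise schedule, step count, and the random stand-in for the trained noise-prediction network are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(7)

T, n = 50, 256
betas = np.linspace(1e-4, 0.05, T)   # noise schedule (illustrative)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

x = rng.standard_normal(n)           # x_T: white noise "waveform"
for t in reversed(range(T)):
    eps_hat = rng.standard_normal(n) * 0.1   # stand-in for the trained network
    coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
    x = (x - coef * eps_hat) / np.sqrt(alphas[t])   # denoising mean step
    if t > 0:
        x += np.sqrt(betas[t]) * rng.standard_normal(n)  # add sampling noise

print(x.shape)
```

Because `T` is a constant, synthesis cost does not grow with sequence position, which is the source of the speedup over autoregressive vocoders.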
The research community has increasing interest in autonomous driving research, despite the resource intensity of obtaining representative real-world data. Existing self-driving datasets are limited in the scale and variation of the environments they capture, even though generalization within and between operating regions is crucial to the overall viability of the technology. In an effort to help align the research community's contributions with real-world self-driving problems, we introduce a new large-scale, high-quality, diverse dataset. Our new dataset consists of 1150 scenes that each span 20 seconds, consisting of well-synchronized and calibrated high-quality LiDAR and camera data captured across a range of urban and suburban geographies. It is 15x more diverse than the largest camera+LiDAR dataset available, based on our proposed geographical coverage metric. We exhaustively annotated this data with 2D (camera image) and 3D (LiDAR) bounding boxes, with consistent identifiers across frames. Finally, we provide strong baselines for 2D as well as 3D detection and tracking tasks. We further study the effects of dataset size and generalization across geographies on 3D detection methods. Find data, code, and more up-to-date information at http://www.waymo.com/open.